Abstract

Background: Actinobacteria of the genus Nocardia usually live in soil or water and play saprophytic roles, but they
also opportunistically infect the respiratory system, skin, and other organs of humans and animals. Primarily because
of the clinical importance of these bacteria, a number of Nocardia genomes have been sequenced. Genome sizes of
Nocardia strains are similar to those of Streptomyces strains, the producers of most antibiotics. In the present work,
we compared secondary metabolite biosynthesis gene clusters of type-I polyketide synthase (PKS-I) and nonribosomal
peptide synthetase (NRPS) among genomes of representative Nocardia species/strains based on domain organization
and amino acid sequence homology.

Results: Draft genome sequences of Nocardia asteroides NBRC 15531T, Nocardia otitidiscaviarum IFM 11049,
Nocardia brasiliensis NBRC 14402T, and N. brasiliensis IFM 10847 were read and compared with the published complete
genome sequences of Nocardia farcinica IFM 10152, Nocardia cyriacigeorgica GUH-2, and N. brasiliensis HUJEG-1.
Genome sizes are as follows: N. farcinica, 6.0 Mb; N. cyriacigeorgica, 6.2 Mb; N. asteroides, 7.0 Mb; N. otitidiscaviarum,
7.8 Mb; and N. brasiliensis, 8.9-9.4 Mb. The predicted numbers of PKS-I, NRPS, and PKS-I/NRPS hybrid clusters ranged
from 4 to 11, 7 to 13, and 1 to 6, respectively, depending on the strain, and tended to increase with genome
size. Domain and module structures of representative or unique clusters are discussed in the text.

Conclusion: We conclude the following: 1) genomes of Nocardia strains carry as many PKS-I and NRPS gene
clusters as those of Streptomyces strains; 2) the number of PKS-I and NRPS gene clusters in Nocardia strains varies
substantially among species, and N. brasiliensis strains carry the largest numbers of clusters among the
species studied; 3) the seven Nocardia strains studied in the present work share seven PKS-I and/or NRPS
clusters, the products of some of which are yet to be studied; and 4) different N. brasiliensis strains carry some
different PKS-I/NRPS gene clusters, although the remaining clusters are shared among the N. brasiliensis strains.
Genome sequencing suggests that Nocardia strains are highly promising resources in the search for novel secondary
metabolites.
Background

Actinomycetous strains of the genus Nocardia usually
live in soil or water and play saprophytic roles in the
environment, but they are also opportunistic human
pathogens, infecting the respiratory tract, skin, brain, and
other organs of both immunocompromised and
immunocompetent patients. To date, more than 80 species
have been established in the genus Nocardia, and
approximately one-third to one-half of them have been
reported as human pathogens [1-3]. Because of their
medical importance, Nocardia strains have accumulated in
microbial collections over the last few decades as a
resource for clinical and scientific studies (e.g., [4-7]).

Although Nocardia belongs to the order Actinomycetales
together with Streptomyces, a genus known as a rich
resource for the discovery of secondary metabolites, few
studies have focused on secondary metabolites and their
biosynthetic genes in Nocardia strains.

Type I polyketide synthase (PKS-I) and nonribosomal
peptide synthetase (NRPS) gene clusters are two of the
major classes of secondary metabolite biosynthesis
clusters in bacteria and are responsible for the
biosynthesis of polyketide chains and nonribosomal
peptides, respectively. These clusters produce several
medically and industrially important compounds, such as
pathogenic factors, avermectin, erythromycin, and vancomycin.

In the present paper, we searched for PKS-I and NRPS
genes in the genomes of representative Nocardia strains
and analyzed their sequence similarities and the
differences in their domain/module structures. While we
were sequencing and analyzing Nocardia draft genomes,
two new Nocardia genomes, those of N. cyriacigeorgica
GUH-2 [8,9] and N. brasiliensis HUJEG-1 [10], were
published. We included them in the present analysis
together with the N. farcinica genome, which our group
published previously [11].

Methods 

Strains 

N. otitidiscaviarum IFM 11049 and N. brasiliensis IFM
10847 were obtained from the IFM culture collection of
MMRC, Chiba University, Japan [12]. N. asteroides NBRC
15531T and N. brasiliensis NBRC 14402T were obtained
from the NBRC culture collection [5]. Cells were cultured
in brain heart infusion liquid medium (Difco) using
standard methods.

Acquisition of whole-genome sequences 

Genomic DNA of N. otitidiscaviarum IFM 11049, N.
brasiliensis (IFM 10847, NBRC 14402T), and N. asteroides
NBRC 15531T was prepared as described previously [13].
Genome sequences were read by pyrosequencing using a
Genome Sequencer FLX instrument and GS FLX Titanium
kits (Roche Applied Science, Japan). Read redundancy
(sequencing depth) for the four draft genomes ranged from
55 to 104. We assembled the sequence reads of N.
otitidiscaviarum IFM 11049, N. brasiliensis IFM 10847,
N. brasiliensis NBRC 14402T, and N. asteroides NBRC
15531T and obtained 65, 223, 115, and 39 contigs longer
than 500 bp, respectively. The estimated genome sizes of
N. otitidiscaviarum IFM 11049, N. brasiliensis IFM 10847,
N. brasiliensis NBRC 14402T, and N. asteroides NBRC
15531T were 7.9 Mb, 9.2 Mb, 8.9 Mb, and 7.0 Mb,
respectively. The draft genome sequences of
N. otitidiscaviarum IFM 11049, N. brasiliensis (IFM
10847, NBRC 14402T), and N. asteroides NBRC 15531T
are available at GenBank/EMBL/DDBJ under the accession
numbers BATZ01000001-BATZ01000065, BAUA01000001-
BAUA01000223, BAFT01000001-BAFT01000128, and
BAFO01000001-BAFO01000049, respectively. The complete
genome sequences of N. cyriacigeorgica GUH-2, N.
brasiliensis HUJEG-1 (=ATCC 700358), and N. farcinica
IFM 10152 were downloaded from DDBJ [14] (accession
numbers FO082843, CP003876, and AP006618, respectively).

Analysis of PKS-I and NRPS gene clusters 

The assembled contig sequences were submitted to the
auto-annotation pipeline MiGAP [15,16] at DDBJ as
described previously [17]. The assigned ORFs were further
searched for signature domains of PKS-I and NRPS genes
using the InterPro domain database [18,19]. ORFs having
a ketosynthase (KS) domain (IPR014030, IPR014031,
IPR020841) or a condensation (C) domain (IPR001242)
were identified, and their adjacent genes were further
analyzed as PKS-I and NRPS gene candidates. Module
organizations were determined manually from InterPro
search results, results from a PKS/NRPS analysis website
[20], and signature sequences identified by MOTIF search
[21]. We also used antiSMASH [22,23], a web server for
antibiotic and secondary metabolite analysis, to find
orthologous clusters and to predict the substrates of
adenylation domains. PKS-I and NRPS gene clusters of
N. farcinica IFM 10152, N. cyriacigeorgica GUH-2, and
N. brasiliensis HUJEG-1 were also identified using the
N. farcinica genomic database [11,24]. We assumed that
two or more adjacent PKS-I and/or NRPS genes constitute
one cluster for secondary metabolite production (see
Additional file 1: Table S1 for details and exceptions). We
also assumed that a single multi-domain PKS-I or NRPS
gene without adjacent PKS-I/NRPS genes constitutes one
independent cluster. However, genes having only a single
PKS-I or NRPS domain were considered atypical and were
excluded, so that the present analysis focused on
multi-domain clusters. The contig sequences containing
PKS-I and NRPS gene clusters are available at
GenBank/EMBL/DDBJ under the following accession
numbers: [AB700569-AB700587] (N. otitidiscaviarum
IFM 11049), [AB701575-AB701605] (N. brasiliensis IFM
10847), [AB701607-AB701636] (N. brasiliensis NBRC
14402T), and [AB685274], [AB700124-AB700133],
[AB700557-AB700568] (N. asteroides NBRC 15531T).
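The grouping rules above can be illustrated with a short Python sketch. It is not the authors' pipeline: the `Orf` record, the domain labels, and the gene names g1-g6 are hypothetical, and only the InterPro accessions come from the text. Treating every ORF with a KS or C hit as a cluster member, without the manual inspection of neighbouring genes described above, is a simplifying assumption, and dropping lone single-domain genes is our reading of the exclusion rule.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# InterPro accessions quoted in the Methods text.
KS_IPR = {"IPR014030", "IPR014031", "IPR020841"}  # ketosynthase (PKS-I)
C_IPR = {"IPR001242"}                             # condensation (NRPS)


@dataclass
class Orf:
    name: str                                         # hypothetical gene ID
    domains: list[str] = field(default_factory=list)  # ordered domain labels
    interpro: set[str] = field(default_factory=set)   # InterPro hits


def is_pks_nrps(orf: Orf) -> bool:
    """True if the ORF carries a KS or C signature domain."""
    return bool(orf.interpro & (KS_IPR | C_IPR))


def group_clusters(orfs: list[Orf]) -> list[list[str]]:
    """Group runs of adjacent PKS-I/NRPS ORFs (input in genome order) into clusters."""
    runs: list[list[Orf]] = []
    current: list[Orf] = []
    for orf in orfs:
        if is_pks_nrps(orf):
            current.append(orf)
        elif current:  # a non-PKS/NRPS gene ends the current run
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    # Two or more adjacent genes form one cluster; a lone gene counts only
    # if it is multi-domain (lone single-domain genes are dropped as atypical).
    return [[o.name for o in run] for run in runs
            if len(run) > 1 or len(run[0].domains) > 1]


example = [
    Orf("g1", ["KS", "AT", "ACP"], {"IPR014030"}),
    Orf("g2", ["C", "A", "PCP"], {"IPR001242"}),
    Orf("g3", ["hypothetical"]),
    Orf("g4", ["C", "A", "PCP", "TE"], {"IPR001242"}),
    Orf("g5", ["hypothetical"]),
    Orf("g6", ["C"], {"IPR001242"}),
]
print(group_clusters(example))  # [['g1', 'g2'], ['g4']]
```

In the toy input, g1 and g2 form one two-gene cluster, g4 is kept as an independent multi-domain cluster, and g6 is dropped because it carries a single domain.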

Search for orthologous gene clusters among species and strains

BLASTP searches were performed using the NCBI Protein
BLAST program against the non-redundant protein
sequence database [25,26]. We considered Nocardia genes
to be homologous to other genes when they showed more
than 70% sequence similarity in BLASTP searches and
their domain organizations were highly similar. We also
compared clusters whose domain organizations only
partially match each other, as described in the text.
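The two homology criteria can be written as a small predicate. This is a minimal sketch under stated assumptions: we read the criteria as both being required, we stand in for "high domain-organization similarity" with the matching ratio of Python's `difflib.SequenceMatcher` and a hypothetical cutoff of 0.8, and the example domain strings are domain organizations described later in the text (KS/AT/DH/KR/ER/ACP and KS/AT/ACP/ACP/KR), not real BLASTP output.

```python
from __future__ import annotations
from difflib import SequenceMatcher

BLASTP_CUTOFF = 70.0  # percent similarity threshold stated in the Methods
DOMAIN_CUTOFF = 0.8   # hypothetical threshold for "high" domain-organization similarity


def domain_similarity(a: list[str], b: list[str]) -> float:
    """Similarity (0-1) of two ordered domain lists, via difflib's matching ratio."""
    return SequenceMatcher(None, a, b).ratio()


def is_homologous(blastp_similarity: float, domains_a: list[str], domains_b: list[str]) -> bool:
    """Both criteria must hold (our reading of the Methods)."""
    return (blastp_similarity > BLASTP_CUTOFF
            and domain_similarity(domains_a, domains_b) >= DOMAIN_CUTOFF)


mycocerosate_like = "KS/AT/DH/KR/ER/ACP".split("/")
pufa_like = "KS/AT/ACP/ACP/KR".split("/")
print(is_homologous(80.0, mycocerosate_like, mycocerosate_like))  # True
print(is_homologous(80.0, mycocerosate_like, pufa_like))          # False
print(is_homologous(65.0, mycocerosate_like, mycocerosate_like))  # False
```

A sequence-only threshold would miss the distinction in the second call, which is why the domain organization is checked as well.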

Results and discussion 

The two leftmost columns in Table 1 list the Nocardia strains
studied in the present paper and their exact (complete
genome) or estimated (draft genome) sizes. The genome
sizes ranged from 6.0 to 9.4 Mb, similar to those of
representative Streptomyces strains (5.0-11.9 Mb), the
most abundant sources of secondary metabolites [27-29].
The fourth column shows that the strains are of different
clinical origins.

Figure 1 illustrates the phylogenetic positions of the five
Nocardia species (seven strains) studied in the present
paper among 78 other established Nocardia strains. It
also includes Streptomyces coelicolor and Mycobacterium
tuberculosis for comparison. Four of the five species
are located in different clades of the 16S rRNA
phylogenetic tree, indicating that the present analysis is
based on information from a wide range of Nocardia
species. N. asteroides and N. cyriacigeorgica are in the same
clade. We also included three N. brasiliensis strains
to examine intra-species variation.

PKS-I, NRPS, and PKS-I/NRPS hybrid gene clusters
in the Nocardia strains were predicted as described in
Methods. The numbers of the three types of clusters
and the total number of clusters in each strain are
listed in the four rightmost columns of Table 1. Among
the seven strains, the numbers of PKS-I, NRPS, and
PKS-I/NRPS hybrid clusters, and their total, increased
with genome size, except for N. otitidiscaviarum and
N. asteroides (Table 1), as reported in other genera [35].
N. farcinica had the fewest gene clusters (12), whereas
N. brasiliensis HUJEG-1 and IFM 10847 had the most (30
each). The total number of clusters differed among the
three N. brasiliensis genomes (27 to 30), suggesting that
different strains of the same species may produce their
own unique products (see below).
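The trend with genome size can be checked directly against Table 1. The paper does not report a correlation coefficient; the sketch below is our own illustration, fitting an ordinary least-squares line of total cluster number on genome size (values copied from Table 1) and listing the two strains that deviate most from it.

```python
from math import sqrt

# (genome size in Mb, total number of PKS-I, NRPS, and hybrid clusters), from Table 1
TABLE1 = {
    "N. farcinica IFM 10152": (6.0, 12),
    "N. cyriacigeorgica GUH-2": (6.2, 15),
    "N. asteroides NBRC 15531T": (7.0, 22),
    "N. otitidiscaviarum IFM 11049": (7.8, 18),
    "N. brasiliensis NBRC 14402T": (8.9, 27),
    "N. brasiliensis IFM 10847": (9.2, 30),
    "N. brasiliensis HUJEG-1": (9.4, 30),
}


def fit_line(xs, ys):
    """Ordinary least squares: return slope, intercept, and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / sqrt(sxx * syy)


sizes, totals = zip(*TABLE1.values())
slope, intercept, r = fit_line(sizes, totals)
# Residual = observed total minus the total predicted from genome size
residuals = {name: total - (slope * size + intercept)
             for name, (size, total) in TABLE1.items()}
outliers = sorted(residuals, key=lambda k: abs(residuals[k]), reverse=True)[:2]
print(f"r = {r:.2f}, slope = {slope:.1f} clusters/Mb")  # r = 0.94, slope = 4.8 clusters/Mb
print(outliers)  # ['N. otitidiscaviarum IFM 11049', 'N. asteroides NBRC 15531T']
```

With these seven points, the largest deviations are N. otitidiscaviarum (fewer clusters than the fit predicts) and N. asteroides (more), which are the two exceptions noted above. Seven strains are too few for this to be more than descriptive.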

We also counted the numbers of type-II PKS, type-III
PKS, and terpene synthesis clusters in each genome. The
numbers ranged between 0 and 3, except in N. otitidis-
caviarum and the N. brasiliensis strains, which have five and
eight terpene synthesis clusters per genome, respectively.
In the present paper, however, we focused on PKS-I
and NRPS secondary metabolite clusters because their
products usually have larger molecular weights and more
complex chemical structures than the others and have
unique pharmacological activities.

Table 1 Genome sizes and numbers of PKS-I, NRPS, and PKS-I/NRPS hybrid gene clusters in Nocardia strains

Strain name | Genome size* (Mb) | State of sequence | Source | PKS-I† | NRPS† | PKS-I/NRPS hybrid† | Total number of clusters†
N. farcinica IFM 10152 [11] | 6.0 | Complete | Clinical (human sputum) | 4 (1.0/1.0) | 7 (1.6/7.7) | 1 (6.0/5.0) | 12 (1.8/5.6)
N. cyriacigeorgica GUH-2 [8,9] | 6.2 | Complete | Clinical (human kidney) | 5 (1.2/1.0) | 9 (1.4/5.8) | 1 (8.0/8.0) | 15 (1.8/4.3)
N. asteroides NBRC 15531T [1,30] | 7.0 | Draft | Clinical (fatal brain abscess) | 7 (1.4/2.6) | 13 (1.2/4.5) | 2 (4.0/4.0) | 22 (1.5/3.9)
N. otitidiscaviarum IFM 11049 [31] | 7.8 | Draft | Clinical (human sputum) | 4 (1.0/1.0) | 12 (1.2/3.3) | 2 (4.0/11.0) | 18 (1.5/3.6)
N. brasiliensis NBRC 14402T [32,33] | 8.9 | Draft | Clinical (leg lesion) | 10 (1.4/1.1) | 13 (1.6/5.0) | 4 (5.5/4.3) | 27 (2.1/3.4)
N. brasiliensis IFM 10847 [12] | 9.2 | Draft | Clinical (human pus) | 11 (1.5/1.1) | 13 (1.5/5.1) | 6 (4.8/4.3) | 30 (2.2/3.5)
N. brasiliensis HUJEG-1 [10] | 9.4 | Complete | Clinical (human mycetoma) | 11 (1.4/1.2) | 13 (1.3/4.5) | 6 (4.8/4.3) | 30 (2.0/3.2)

*In draft genomes, genome sizes are estimated values. †Number of gene clusters, with the average numbers of genes/modules per cluster in parentheses.

Figure 1 Phylogenetic positions of Nocardia strains studied in the present work. The phylogenetic tree was constructed using 16S rRNA
gene sequences of type strains (http://www.bacterio.cict.fr/, http://www.ncbi.nlm.nih.gov/nuccore). MEGA5 software [34] was used to draw an
unrooted neighbor-joining phylogenetic tree. Positional information for Mycobacterium tuberculosis (GenBank accession # X58890) and Streptomyces
coelicolor (#AB184196) was added to the tree. Bootstrap values from 1000 resamplings are shown only for the main branches. The five species studied in
the present work are marked with red arrows.

Figure 2 shows all the clusters found in each genome.
Presumptive orthologous clusters, as defined in Methods,
are aligned in the same row. The rightmost column shows
putative products, either retrieved from databases (e.g.,
[23]) or inferred using the tools described in Methods.

Clusters common among the seven strains 

Figure 2 suggests that seven putative products (lines
#1, #2, #4, #5, #25, #27, and #35) are common to the
seven strains belonging to the five species. Notably,
clusters #1, #2, #4, and #5 are located close to the origin
of replication (ori) in the three species whose complete
genome sequences are known, in accordance with a
report showing that conserved genes reside in the internal
core region of actinomycete genomes [27].

Mycolic acid (pks13)

We predicted that the products of the PKS-I genes in line
#1 were mycolic acids, cell wall components of members
of the Corynebacterineae, because these PKS-Is showed
the same domain organization as Pks13 of Mycobacterium
tuberculosis, which synthesizes mycolic acids [36], and
also showed over 80% sequence similarity to the PKS-Is
of N. farcinica annotated for mycolic acid synthesis [24].

Poly-lysine (Pls)

NRPS genes of line #2 were predicted to be involved in
poly-lysine synthesis because their module organizations
are the same as that of poly-lysine synthetase (Pls) in
Streptomyces albulus [37]. The corresponding gene,
nfa3790, is also annotated as a Pls homolog in the
N. farcinica database [24]. The sequence identity between
Pls of S. albulus and nfa3790 of N. farcinica was 55%.

Figure 2 PKS-I, NRPS, and PKS-I/NRPS hybrid gene clusters identified in genome sequences of Nocardia strains. Columns: line number (#), synthase type (pks, nrps, or hybrid), genes in N. farcinica IFM 10152, N. cyriacigeorgica GUH-2, N. asteroides NBRC 15531T, N. otitidiscaviarum IFM 11049, and N. brasiliensis (NBRC 14402T, IFM 10847, HUJEG-1), and putative product. nfa, NOCYR, NCAST, NOTIT, NBRGN, NBRGI, and O3I, listed under the strain names, are the gene prefixes for each strain, and the numbers beneath them are gene numbers. Putative products given in the figure include: #1, mycolic acid; #2, poly-lysine; #4, Ser/Thr-rich 11 aa peptide (or two shorter peptides); #5, nocobactin-related siderophore; #11, polyunsaturated fatty acid; #13, mycocerosic acid?; #18, Thr-x-x-x-Leu-x-x-Val-Leu; #21, (Thr-x-)x-x-Leu-x-x-x; #22, enediyne (see text); #30, mycocerosate; #35, Ser-rich 12 aa peptide; #36, Ser-rich 13 aa peptide.

Ser/Thr-rich nonribosomal peptides

NRPS gene clusters in line #4 were present in all strains
examined, but only a partial sequence was found in
N. otitidiscaviarum (Additional file 2: Figure S1A).
Zoropogui et al. [9] suggested that cluster #4 in
N. cyriacigeorgica is responsible for the synthesis of
2-amino-9,10-epoxy-8-oxodecanoic acid, a component of
HC-toxin, which is produced by a plant pathogen causing
corn leaf spot (reviewed in [38]). However, we suggest two
other possibilities based on the domain organization of the
cluster. One is that the intact #4 cluster synthesizes a
serine/threonine-rich peptide composed of 11 amino acids.
The other is that the same sequence comprises two
different NRPS clusters, since two thioesterase domains
are present within it, and accordingly produces two
peptide chains. In the latter case, the products of cluster
#4 in N. asteroides would be two peptides: one composed
of four amino acids (NCAST_11_00880 and
NCAST_11_00870) and the other composed of seven
amino acids (NCAST_11_00860 and NCAST_11_00850).
Likewise, in N. brasiliensis, the products of cluster #4
would be two peptides: one composed of six amino acids
(O3I_004080 and O3I_004085) and the other composed of
five amino acids (O3I_004090) (Figure 2, #4, rightmost
column).
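The two readings of cluster #4 differ only in whether the internal thioesterase (TE) domain releases a chain. The following minimal sketch assumes one adenylation (A) domain per incorporated residue and uses a hypothetical domain layout rather than the actual NCAST or O3I domain strings.

```python
from __future__ import annotations


def peptide_lengths(domains: list[str]) -> list[int]:
    """Count adenylation (A) domains per chain, closing a chain at each TE domain."""
    lengths: list[int] = []
    n_a = 0
    for d in domains:
        if d == "A":
            n_a += 1
        elif d == "TE":
            lengths.append(n_a)
            n_a = 0
    if n_a:  # modules after the last TE (no release domain)
        lengths.append(n_a)
    return lengths


module = ["C", "A", "PCP"]
# Hypothetical layout: 4 modules, TE, 7 modules, TE (not the real NCAST domain string)
toy_cluster4 = module * 4 + ["TE"] + module * 7 + ["TE"]
print(peptide_lengths(toy_cluster4))       # [4, 7]: two-peptide reading
print(sum(peptide_lengths(toy_cluster4)))  # 11: single-peptide reading
```

Which reading is correct cannot be decided from the domain string alone. Only chemical analysis of the products can show whether the internal TE domain is functional.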

One possible reason why N. otitidiscaviarum, unlike the
other species, has only a partial #4 cluster is that this
species is phylogenetically distant from the other strains,
as shown in Figure 1. The relationships among the
products of these gene clusters, the phylogenetic positions
of the strains, and their pathogenicity to plants and
animals remain to be clarified.
NRPS genes in line #35 were also present in all the
strains examined, although those of N. otitidiscaviarum
and N. brasiliensis IFM 10847 were partial compared
with those of the other strains (Additional file 2: Figure
S1B). The genes in the five strains other than
N. otitidiscaviarum and N. brasiliensis IFM 10847 have
12 modules, and many of their adenylation domains were
predicted to select Ser as the substrate. Hence, we assumed
that the products would be Ser-rich 12 aa peptides. In
addition to the line #35 clusters common to all seven
strains, similar NRPS clusters were found in line #36,
but only in the four species other than N. brasiliensis.
Interestingly, the predicted products are Ser-rich 13 aa
peptides.
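Labels such as "Ser-rich 12 aa peptide" summarize per-module substrate predictions. The sketch below shows one way to produce such a label. The 30% cutoff for "rich", the "?" placeholder for unpredicted adenylation domains, and the substrate calls are all hypothetical, not the antiSMASH output for line #35.

```python
from __future__ import annotations
from collections import Counter


def describe_peptide(substrates: list[str], rich_fraction: float = 0.3) -> str:
    """Label a predicted peptide by residues making up >= rich_fraction of its A-domain calls."""
    counts = Counter(s for s in substrates if s != "?")  # "?" = no confident prediction
    rich = sorted(aa for aa, n in counts.items() if n / len(substrates) >= rich_fraction)
    prefix = "/".join(rich) + "-rich " if rich else ""
    return f"{prefix}{len(substrates)} aa peptide"


# Hypothetical substrate calls for a 12-module NRPS (not the real line #35 predictions)
calls = ["Ser", "Thr", "Ser", "?", "Ser", "Gly", "Ser", "Ser", "?", "Ser", "Ala", "Ser"]
print(describe_peptide(calls))  # Ser-rich 12 aa peptide
```

Peptide length is taken from the number of modules, so unpredicted positions still count toward the length but not toward any residue's share.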

Although the conservation of large Ser- (and Thr-) rich
peptides synthesized by clusters #4, #35, and #36 in
the Nocardia strains suggests that they have important
roles, the physiological roles of the products remain to
be investigated.

Nocobactin (nbt)-like siderophore 

The PKS-I/NRPS hybrid cluster #5 in N. farcinica
produces nocobactin, a siderophore and pathogenic factor,
as shown by Hoshino et al. [39]. Figure 3 compares the
clusters in line #5, which are candidate nocobactin-like
siderophore-producing genes. N. asteroides has a full set
of the genes required for siderophore synthesis
(NCAST_11_00460 through NCAST_11_00420 and
NCAST_13_1740 in #5). NCAST_13_1740 contains an
nbtF-like gene [39] but is separated from the rest of the
nbt genes in contig 11 by more than 180 kb, based on
the contig sequences. In N. brasiliensis, the corresponding
cluster differs from that of N. farcinica in module number
and domain organization, suggesting that a functionally
similar but structurally different molecule is synthesized
in N. brasiliensis. In N. otitidiscaviarum, cluster #5 lacks
genes corresponding to nbtA-C and nbtE, which are
required for nocobactin synthesis in N. farcinica [39],
suggesting that cluster #5 in N. otitidiscaviarum may not
be able to produce a nocobactin-like siderophore.
Interestingly, however, nbtD- and nbtF-like genes, which
show 75% similarity to nbtD and nbtF of N. farcinica,
respectively, are found in the middle of two different
contigs in the N. otitidiscaviarum genome (contigs 3 and
41, respectively; listed in cluster #5 in Figure 2; see also
Figure 3), suggesting gene loss/gain and recombination
during evolution within the genus Nocardia. Such gene
loss/gain is observed not only in the nbt-like gene cluster
#5 but also in the nfa7170-7200 homologs (#4), nfa50330
homologs (#35), and nfa50630-50620 homologs (#36), as
shown in Additional file 2 (Figure S1). Furthermore,
N. otitidiscaviarum has another candidate gene cluster for
siderophore synthesis, shown in line #40 of Figure 2. The
domain and module organization of this cluster is shown
in Figure 3F, which highlights its differences from the
nocobactin synthesis cluster.

Other common gene clusters 

Two NRPS genes in lines #25 and #27 (Figure 2) were
common to all the strains sequenced, having four and
three modules, respectively. However, the amino-acid
composition of the products could not be predicted in
silico using antiSMASH. Chemical structures and
physiological roles of the products remain to be
investigated.

Figure 3 Comparison of nbt-like, siderophore-synthesizing gene clusters (Figure 2 #5) among Nocardia species. N. farcinica (A),
N. otitidiscaviarum (B), N. asteroides (C), N. brasiliensis NBRC 14402T (D), and N. brasiliensis IFM 10847 (E). An nbtF-like gene was located
F. Domains and module structures of a putative siderophore synthetic gene cluster (line #40 in Figure 2) found only in N. otitidiscaviarum
IFM 11049. Putative gene functions were inferred by BLASTP search [26] and MOTIF search [21] and are indicated in the figure.

Clusters missing in a few strains but found in others 
Mycocerosic acid 

PKS-I clusters of line #13 (Figure 2) were present in six
strains, all except N. otitidiscaviarum. Another set of
PKS-I clusters, in line #30, was present in four strains but
absent from N. asteroides and from two of the
N. brasiliensis strains (NBRC 14402T and IFM 10847).
nfa30250 (Figure 2 #13) has been predicted to be involved
in mycocerosic acid synthesis in the N. farcinica genome
project [24], whereas O3I_032485 in N. brasiliensis
HUJEG-1 (Figure 2 #30) has been putatively annotated as
mycocerosate synthase [10]. Both PKS-Is, in N. farcinica
(#13) and in N. brasiliensis (#30), showed approximately
47% amino acid similarity to the Mycobacterium
tuberculosis PKS-I [GenBank/EMBL/DDBJ accession
number: CCP46654] that synthesizes mycocerosic acid, a
pathogenic factor [40,41]. All the PKS-Is listed in lines
#13 and #30 showed sequence similarities of 62 to 75% to
one another, with almost the same protein length
(approximately 2200 aa) and identical domain
organization (KS/AT/DH/KR/ER/ACP) (Additional file 1:
Table S1).

It is possible that in N. otitidiscaviarum, cluster #30 is 
a substitute for cluster #13. Interestingly, N. farcinica 
and N. cyriacigeorgica possess both #13 and #30, which 
could be related to the strong pathogenicity of the two 
strains. 

Polyunsaturated fatty acid (pfaA) 

Orthologous genes of cluster #11 are found in N. asteroides,
N. otitidiscaviarum, and N. brasiliensis IFM 10847 and
HUJEG-1, but not in N. farcinica, N. cyriacigeorgica,
or N. brasiliensis NBRC 14402T (Figure 2). The PKS-I
sequence identities among N. asteroides, N. otitidiscaviarum,
and N. brasiliensis ranged from 66 to 78%. The modular
structures of the PKS-Is in line #11 (KS/AT/ACP/ACP/KR)
are unusual because they have two tandem ACP domains,
and a KR domain is located after the ACP domains
(Additional file 3: Figure S2). These features are known in
the polyunsaturated fatty acid (PUFA) synthase PfaA of
marine bacteria, such as members of the genus Shewanella
[42,43]. The module organizations of #11 and PfaA are
similar (Figure S2), and their amino-acid sequence
similarities exceed 50%. Hence, we predict that the
products of the PKS-Is in line #11 are polyunsaturated
fatty acids, as already reported for the N. brasiliensis
HUJEG-1 genome [10]. However, among prokaryotes,
production of polyunsaturated fatty acids has been
reported only in some psychrophilic, piezophilic, or
halophilic bacteria [44,45], and not in Nocardia strains.
Thus, future chemical and synthetic analyses are required
to explore the potential of Nocardia strains as industrial
producers of these pharmaceutically and nutraceutically
valuable compounds.
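The two PfaA-like features used above, tandem ACP domains and a KR positioned after an ACP, can be flagged from an ordered domain list. This hypothetical helper checks only those two traits and is not a PUFA-synthase classifier.

```python
from __future__ import annotations


def pfaa_like_features(domains: list[str]) -> dict[str, bool]:
    """Flag the two PfaA-like traits noted in the text for line #11 modules."""
    tandem_acp = any(a == b == "ACP" for a, b in zip(domains, domains[1:]))
    # Position of the first ACP; len(domains) if there is none
    first_acp = domains.index("ACP") if "ACP" in domains else len(domains)
    kr_after_acp = "KR" in domains[first_acp + 1:]
    return {"tandem_ACP": tandem_acp, "KR_after_ACP": kr_after_acp}


print(pfaa_like_features("KS/AT/ACP/ACP/KR".split("/")))
# {'tandem_ACP': True, 'KR_after_ACP': True}
print(pfaa_like_features("KS/AT/DH/KR/ER/ACP".split("/")))
# {'tandem_ACP': False, 'KR_after_ACP': False}
```

The canonical module order used for comparison (KS/AT/DH/KR/ER/ACP) is the organization reported above for the line #13 and #30 PKS-Is.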

Species-specific clusters 

Only a few species-specific clusters are found in
N. farcinica (cluster #29), N. cyriacigeorgica (#7, #8, #24,
#34), and N. otitidiscaviarum (#40, #41, #42), suggesting
that the evolution of their polyketide and nonribosomal
peptide metabolites is not as dynamic as in the other
species (Figure 2). In contrast, N. asteroides has nine
species-specific clusters (PKS-I, #43-#46; NRPS, #28,
#47-#49; PKS-I/NRPS hybrid, #50). Among them, we
selected #46 as a representative unique cluster in
N. asteroides (Figure 4A). The cluster consists of four
adjacent genes, NCAST_33_02270, _02280, _02290, and
_02300, each of which has its top BLAST hit to a gene in
a different Streptomyces strain (see Additional file 1:
Table S1 for details). The cluster comprises twelve
modules and is larger than any other cluster in the
Nocardia strains studied in this paper. The order of
domains within the modules follows the accepted model
of PKS-I gene cluster structure, i.e., repeats of
"KS-AT-optional domains-ACP" [46,47]. The same module
organization, however, could not be found in available
public databases, and the cluster has less than 53%
amino-acid sequence similarity to any other known PKS-I
cluster. This suggests that the gene cluster may be
involved in the production of a novel secondary
metabolite. We propose a chemical structure for the
polyketide chain synthesized by this cluster, shown
in Figure 4B, based on the PKS-I assembly line rule
[46] as follows. The presence of a ketosynthase-Q (KSQ)
domain at the N terminus of NCAST_33_02270 and a
thioesterase (TE) domain at the C terminus of
NCAST_33_02300 indicates that NCAST_33_02270 and
NCAST_33_02300 contain the modules that initiate and
terminate the PKS-I assembly line, respectively. Of the
eleven AT domains in modules 1 (m1) to 11 (m11), ten,
all except that in m8, had the HAFHS signature
amino-acid sequence, which is specific for malonyl-CoA
in substrate recognition [48,49]. The substrate of the AT
domain in m8, which has the IASHS amino-acid sequence,
and the starter molecule loaded on the LM could not be
predicted using bioinformatic approaches. Hence, this
polyketide backbone is predicted to be
Cx-C2-C2-C2-C2-C2-C2-C2-Cy-C2-C2-C2, where C2 is a
unit derived from malonyl-CoA, and Cx and Cy are carbon
backbones derived from presently unknown substrates. An
inactive DH (dh)-KR pair, which is responsible for the
formation of a hydroxyl group, is present as the optional
domains in modules m1, m3, and m4, while the optional
domains in m2 are a DH-ER-KR trio, which completely
reduces the ketone residue formed by the m2 module. The
optional domains of m5 through m11 are DH-KR pairs,
which form double bonds from the ketone residues
produced by these modules. The resulting molecule has a
polyketide backbone consisting of more than 23 carbons.

Figure 4 A representative example of a PKS-I gene cluster (#46 in Figure 2) (A), and its putative product (B). The cluster is specific to
N. asteroides NBRC 15531T. A. Domain organization. LM, loading module; m1-m11, modules; KS, ketosynthase; AT, acyltransferase; ACP, acyl
carrier protein; DH, dehydratase; KR, ketoreductase; ER, enoyl reductase domains. Gray "DH" domains are probably inactive. B. Chemical structure
of the intermediate polyketide chain predicted from gene cluster #46 based on the assembly line rule [46,49].
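The backbone and carbon count above follow mechanically from the module data given in the text. The sketch below encodes m1-m11 as described (HAFHS in every AT domain except IASHS in m8; dh-KR in m1, m3, and m4; DH-ER-KR in m2; DH-KR in m5-m11) and applies a simplified assembly-line rule. The assumption that each unknown unit (Cx, Cy) contributes at least two backbone carbons is ours. Under it, the minimum is 24 carbons, consistent with "more than 23".

```python
from __future__ import annotations

MALONYL_MOTIF = "HAFHS"  # malonyl-CoA-specific AT signature cited in the text [48,49]

# Beta-carbon outcome of each optional-domain set (simplified assembly-line rule;
# "dh" marks an inactive dehydratase, so only the KR acts).
OUTCOME = {
    ("dh", "KR"): "hydroxyl",
    ("DH", "KR"): "double bond",
    ("DH", "ER", "KR"): "methylene (fully reduced)",
}

# Modules m1-m11 of cluster #46 as described in the text: (AT motif, optional domains)
MODULES = {}
for i in range(1, 12):
    motif = "IASHS" if i == 8 else MALONYL_MOTIF
    if i == 2:
        optional = ("DH", "ER", "KR")
    elif i in (1, 3, 4):
        optional = ("dh", "KR")
    else:
        optional = ("DH", "KR")
    MODULES[f"m{i}"] = (motif, optional)


def backbone(modules: dict) -> str:
    """Cx = unknown starter from the loading module; C2 = malonyl-derived; Cy = unknown."""
    units = ["Cx"] + ["C2" if motif == MALONYL_MOTIF else "Cy" for motif, _ in modules.values()]
    return "-".join(units)


def min_backbone_carbons(chain: str, min_unknown: int = 2) -> int:
    """Assumes every unknown unit contributes at least min_unknown backbone carbons."""
    return sum(2 if unit == "C2" else min_unknown for unit in chain.split("-"))


chain = backbone(MODULES)
print(chain)                        # Cx-C2-C2-C2-C2-C2-C2-C2-Cy-C2-C2-C2
print(min_backbone_carbons(chain))  # 24, i.e. more than 23
print({m: OUTCOME[opt] for m, (_, opt) in MODULES.items() if m in ("m1", "m2", "m5")})
```

The reduction table ignores stereochemistry and any trans-acting enzymes, so it reproduces only the functional-group pattern stated in the text, not a full structure.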

The N. brasiliensis strains had the largest number of species-specific clusters among the strains studied in the present work: five to six PKS-I clusters (#15, #16, #19, #22, #26, #51, #52), six NRPS clusters (#3, #6, #17, #18, #21, #32), and two to four PKS-I/NRPS hybrid clusters (#9, #10, #12, #23), depending on the strain. Among them, three PKS-I clusters (#22, #51, #52) consist of a single module, suggesting that their final or intermediate products may be small according to the assembly line rule [46,47], unless the modules are used iteratively, as has been reported in actinomycetes (e.g., [50,51]). Interestingly, the PKS-I in #22 shows 61% sequence similarity to PksE of Streptomyces griseus [GenBank/EMBL/DDBJ accession number: AA025858], whose product is an unusual polyketide compound containing a 9-membered enediyne core [52].
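Why a single module implies a small product, unless it is reused, can be made concrete with simple arithmetic. The sketch below assumes a two-carbon starter and two-carbon malonyl-derived extender units, and treats the number of iterations as a free parameter, because it cannot be read from the sequence; the function names are hypothetical.

```python
def extension_cycles(n_modules: int, iterations_per_module: int = 1) -> int:
    """Extender units added: a modular PKS uses each module once,
    an iterative PKS reuses a module a (sequence-unpredictable) number of times."""
    if n_modules < 0 or iterations_per_module < 1:
        raise ValueError("need n_modules >= 0 and iterations_per_module >= 1")
    return n_modules * iterations_per_module

def backbone_carbons(n_modules: int, iterations_per_module: int = 1,
                     starter_carbons: int = 2, extender_carbons: int = 2) -> int:
    # Assumes an acetyl-like C2 starter and malonyl-derived C2 extenders.
    return starter_carbons + extender_carbons * extension_cycles(
        n_modules, iterations_per_module)

print(backbone_carbons(1))                           # 4 (single module, used once)
print(backbone_carbons(1, iterations_per_module=7))  # 16 (arbitrary 7 rounds)
```

The iteration count of 7 is chosen only for illustration; it is not a prediction for clusters #22, #51, or #52.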

The remaining clusters, which have multiple modules, may produce larger species-specific products. In particular, the nonribosomal peptides produced by clusters #18 and #21 are predicted to consist of nine and six (to eight) amino acids, respectively.
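Under the NRPS colinearity rule, the predicted peptide length is the number of functional adenylation (A) domains along the assembly line. The sketch below shows one way a length range such as "six (to eight)" can arise, namely when some A domains cannot confidently be called active; this is an assumption for illustration, not necessarily the authors' reasoning, and the domain strings and the lowercase "a" notation are made up rather than taken from clusters #18 or #21.

```python
def predicted_length(domains: list[str]) -> tuple[int, int]:
    """(min, max) peptide length under the NRPS colinearity rule.

    Hypothetical notation: "A" is an adenylation domain judged functional,
    "a" one judged possibly inactive; each functional A domain loads one
    amino acid, so uncertain domains widen the predicted range.
    """
    certain = domains.count("A")
    uncertain = domains.count("a")
    return certain, certain + uncertain

# Made-up architectures, not the real domain strings of clusters #18 or #21.
nine_mer = ["C", "A", "PCP"] * 9 + ["TE"]
six_to_eight = ["C", "A", "PCP"] * 6 + ["C", "a", "PCP"] * 2 + ["TE"]

print(predicted_length(nine_mer))      # (9, 9)
print(predicted_length(six_to_eight))  # (6, 8)
```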

Intra-species variations of clusters 

Strain-specific as well as species-specific PKS-I clusters were found in N. brasiliensis strains (#19, #30, #52), indicating that different strains of the same species may produce different products. Because large numbers of Nocardia strains are stored in several bio-resource centers (e.g., [5-7]), and their numbers are rapidly increasing [12], these strains constitute a highly promising future resource for exploring secondary metabolites.
Figure 5 Large PKS-I/NRPS hybrid gene clusters found in N. otitidiscaviarum, N. brasiliensis HUJEG-1 and IFM 10847 (Figure 2 #20). A: The hybrid gene cluster found in N. otitidiscaviarum. B and C: N. brasiliensis HUJEG-1 and IFM 10847 have only the left half of the hybrid cluster shown in A. A similar cluster in N. brasiliensis NBRC 14402T is truncated at the edge of a contig and is not shown here. D and E: Clusters similar to the one in A are found in Rhodococcus opacus B4 (D) [53] and Rhodococcus jostii RHA1 (E) [54]. In the Rhodococcus genomes, the left cluster and the right two NRPS clusters are located at separate loci.



Other unique examples of clusters 

Figure 5A shows the module structure of PKS-I/NRPS hybrid cluster #20 (Figure 2) in N. otitidiscaviarum. The left half (approximately 8,300 aa) of the cluster is highly similar to N. brasiliensis hybrid cluster #20 (75-82% similarity), whereas the right half (7,746 aa of NOTIT_47_07220 plus 9,157 aa of NOTIT_47_07270) has top hits to two proteins, one in Rhodococcus opacus (7,746 aa) [GenBank/EMBL/DDBJ accession number: YP_002777453] (Figure 5D) and one in Gordonia aichiensis (9,517 aa) [GenBank/EMBL/DDBJ accession number: WP_005170336], with 59% (4,667/7,781) and 54% (4,543/8,322) amino acid similarity, respectively. The right half, consisting of two NRPSs, is widely conserved in Rhodococcus spp., including R. jostii, as shown in Figure 5E.
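The similarity percentages can be recomputed from the aligned-position counts given in parentheses. In the sketch below (the dictionary and function name are ours), 4,667/7,781 comes out at 59.98% and 4,543/8,322 at 54.59%, so the reported 59% and 54% appear to be truncated rather than rounded; that is our reading of the numbers, not a statement from the text.

```python
import math

# (similar positions, alignment length) as given in the text for the two top hits
hits = {
    "R. opacus YP_002777453": (4667, 7781),
    "G. aichiensis WP_005170336": (4543, 8322),
}

def percent_similarity(similar: int, aligned: int) -> float:
    """Percentage of aligned positions counted as similar."""
    return 100 * similar / aligned

for name, (similar, aligned) in hits.items():
    pct = percent_similarity(similar, aligned)
    print(f"{name}: {pct:.2f}% (truncated {math.floor(pct)}%, rounded {round(pct)}%)")
```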

Because cluster #20 of N. otitidiscaviarum contains one PKS-I module and 19 NRPS modules, its product has tentatively been predicted to include one polyketide chain and 19 amino acids (Additional file 4: Figure S3). However, the two genes in cluster #20, NOTIT_47_07270 and NOTIT_47_07220, each contain a thioesterase (Te) domain at the C-terminal end, suggesting the alternative possibility that the product contains 12 amino acids instead of 19.
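Because a Te domain normally releases the growing chain, a Te at the end of an internal gene can split one long assembly line into shorter products. The sketch below counts products under that assumption; the `Gene` class, the gene names, and the module counts are hypothetical and do not reproduce the actual module distribution in cluster #20.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    locus: str          # hypothetical label
    nrps_modules: int   # amino-acid-loading modules in this gene
    ends_with_te: bool  # thioesterase domain at the C terminus

def peptide_lengths(genes: list[Gene]) -> list[int]:
    """Split an ordered assembly line into products at every Te domain."""
    products, current = [], 0
    for g in genes:
        current += g.nrps_modules
        if g.ends_with_te:
            products.append(current)  # Te releases the chain built so far
            current = 0
    if current:
        products.append(current)      # trailing modules with no Te
    return products

def full_length(genes: list[Gene]) -> int:
    """Length if the whole line acted as one assembly line."""
    return sum(g.nrps_modules for g in genes)

toy = [Gene("gene_1", 3, False), Gene("gene_2", 4, True), Gene("gene_3", 5, True)]
print(full_length(toy))      # 12
print(peptide_lengths(toy))  # [7, 5]
```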

All cluster structures analyzed in the present work are listed in Additional file 1: Table S1.

Conclusions 

We conclude the following: 1) genomes of Nocardia strains carry as many PKS-I and NRPS clusters as those of Streptomyces strains, 2) the number of PKS-I and NRPS gene clusters in Nocardia strains varies substantially depending on the species, and N. brasiliensis strains carry the largest number of clusters among the species studied, 3) the seven Nocardia strains studied in the present work have six common PKS-I/NRPS clusters, the products of some of which are yet to be studied, and 4) different N. brasiliensis strains have a few different clusters for secondary metabolite synthesis. In addition, the following are suggested: 1) there is no clear relationship between genome size and pathogenicity in Nocardia strains; e.g., N. farcinica and N. brasiliensis are both prevalent pathogens, but their genome sizes are 6.0 Mb and 9.4 Mb, respectively, the minimum and maximum among the strains studied, and 2) some genes (e.g., cluster #17 in Figure 2) are likely to have been horizontally transferred from (or to) other actinomycetes, such as Rhodococcus spp.

To summarize, in this study we compared complete and draft genome sequences of seven strains from five representative Nocardia species. The sequences provided useful information for inferring the numbers and molecular structures of secondary metabolites potentially produced by Nocardia strains. Genome sequencing revealed that Nocardia strains may be as attractive a resource as Streptomyces strains, the largest source of natural compounds, in the search for new useful secondary metabolites.

Additional files 



Additional file 1: Table S1. ORFs and module/domain structures of PKS-I, NRPS, and PKS-I/NRPS hybrid gene clusters in the genomes. Data for N. otitidiscaviarum IFM 11049, N. asteroides NBRC 15531T, and N. brasiliensis NBRC 14402T and IFM 10847 are shown.

Additional file 2: Figure S1. Representative NRPS gene clusters in N. farcinica and their homologs in other strains. A. N. asteroides has a cluster with overall similarity to nfa7170-7200, but its third ORF, NCAST_11_00860, is similar to the first ORF, nfa7170, rather than to the third ORF, nfa7190, at the corresponding position. N. brasiliensis NBRC 14402T and IFM 10847 lack ORFs corresponding to nfa7180. B. N. brasiliensis IFM 10847 has only partial sequences of a gene homologous to N. farcinica nfa50330, while the homolog in N. otitidiscaviarum is not only partial but also distantly located in the genome. C. N. asteroides possesses an nfa50630 homolog but lacks an nfa50620 homolog. N. brasiliensis strains have no homologs.

Additional file 3: Figure S2. Comparison of putative polyunsaturated fatty acid synthase (PfaA) genes between the genus Nocardia (Figure 2 #11) and the genus Shewanella.

Additional file 4: Figure S3. Predicted chemical structure of the 
product from PKS-I/NRPS hybrid gene cluster #20 in N. otitidiscaviarum. 